WHO COVID-19 Research Database

DecentTree: Scalable Neighbour-Joining for the Genomic Era (preprint)

Weiwen Wang; James Barbetti; Thomas Wong; Bryan Thornlow; Russ Corbett-Detig; Yatish Turakhia; Robert Lanfear; Bui Quang Minh.

biorxiv; 2022.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2022.04.10.487712

ABSTRACT

Summary: Neighbour-Joining (NJ) is one of the most widely used distance-based phylogenetic inference methods. However, current implementations do not scale well to datasets with more than 10,000 sequences. Given the increasing pace at which new sequence data are generated, particularly during outbreaks of emerging diseases, and the already enormous databases of sequence data for which NJ is a useful approach, new implementations of existing methods are warranted. Here we present DecentTree, which provides highly optimised, parallel implementations of NJ and several of its variants. DecentTree is designed both as a stand-alone application and as a header-only library that is easily integrated with other phylogenetic software (e.g. it is an integral part of the popular IQ-TREE software). We show that DecentTree achieves similar or better performance than existing software (BIONJ, Quicktree, FastME and RapidNJ), especially on very large alignments. For example, DecentTree is up to 6-fold faster than the fastest existing NJ software (RapidNJ) when building a tree of 64,000 SARS-CoV-2 genomes.

Availability and implementation: DecentTree is open source and freely available at https://github.com/iqtree/decenttree.

Contact: Minh Bui: m.bui@anu.edu.au; Robert Lanfear: rob.lanfear@anu.edu.au

Supplementary information: Supplementary data are available at Bioinformatics online.
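For readers unfamiliar with the method DecentTree accelerates, the sketch below shows the textbook Saitou-Nei neighbour-joining procedure: repeatedly join the pair of clusters that minimises Q(i, j) = (n - 2) d(i, j) - r_i - r_j, assign branch lengths, and replace the pair by a new node. This is a naive O(n³) illustration only, not DecentTree's optimised or parallel implementation and not its API; the function name `neighbour_joining` and the Newick-string output are illustrative assumptions.

```python
from typing import List


def neighbour_joining(names: List[str], d: List[List[float]]) -> str:
    """Textbook Saitou-Nei neighbour-joining; returns an unrooted Newick string.

    Hypothetical helper for illustration, not DecentTree's implementation.
    """
    if len(names) < 3:
        raise ValueError("neighbour-joining needs at least three taxa")
    nodes = list(names)              # current clusters, stored as Newick fragments
    dist = [row[:] for row in d]     # working copy of the distance matrix
    while len(nodes) > 3:
        n = len(nodes)
        r = [sum(row) for row in dist]   # row sums r_i = sum_k d(i, k)
        # Pick the pair (i, j) minimising Q(i, j) = (n - 2) d(i, j) - r_i - r_j.
        best_q, i, j = None, -1, -1
        for a in range(n):
            for b in range(a + 1, n):
                q = (n - 2) * dist[a][b] - r[a] - r[b]
                if best_q is None or q < best_q:
                    best_q, i, j = q, a, b
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * dist[i][j] + (r[i] - r[j]) / (2 * (n - 2))
        lj = dist[i][j] - li
        keep = [k for k in range(n) if k not in (i, j)]
        # Distance from u to every remaining cluster k.
        du = [0.5 * (dist[i][k] + dist[j][k] - dist[i][j]) for k in keep]
        # Rebuild the matrix without rows/columns i and j, then append u.
        dist = [[dist[a][b] for b in keep] + [du[x]] for x, a in enumerate(keep)]
        dist.append(du + [0.0])
        nodes = [nodes[k] for k in keep] + [f"({nodes[i]}:{li:g},{nodes[j]}:{lj:g})"]
    # Three clusters remain: resolve them as one unrooted trifurcation.
    (x, y, z), m = nodes, dist
    lx = (m[0][1] + m[0][2] - m[1][2]) / 2
    ly = (m[0][1] + m[1][2] - m[0][2]) / 2
    lz = (m[0][2] + m[1][2] - m[0][1]) / 2
    return f"({x}:{lx:g},{y}:{ly:g},{z}:{lz:g});"
```

Usage, on the classic five-taxon example matrix:

```python
D = [[0, 5, 9, 9, 8], [5, 0, 10, 10, 9], [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3], [8, 9, 7, 3, 0]]
print(neighbour_joining(["a", "b", "c", "d", "e"], D))
# (d:2,e:1,(c:4,(a:2,b:3):3):2);
```

The double loop over all pairs in every iteration is what makes naive NJ cubic in the number of sequences, which is the scaling bottleneck the abstract describes for datasets beyond 10,000 sequences.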

Maximum likelihood pandemic-scale phylogenetics (preprint)

Nicola De Maio; Prabhav Kalaghatgi; Yatish Turakhia; Russell Corbett-Detig; Bui Quang Minh; Nick Goldman.

biorxiv; 2022.

Preprint in English | bioRxiv | ID: ppzbmed-10.1101.2022.03.22.485312

ABSTRACT

Genomic data plays an essential role in the study of transmissible diseases, as exemplified by its current use in identifying and tracking the spread of novel SARS-CoV-2 variants. However, as genomic epidemiological datasets grow, their phylogenetic analysis becomes increasingly impractical because of its high computational demand. In particular, while maximum likelihood methods are the go-to tools for phylogenetic inference, the scale of datasets from the ongoing pandemic has made clear the urgent need for more computationally efficient approaches. Here we propose a new likelihood-based phylogenetic framework that greatly reduces both the memory and time demands of popular maximum likelihood approaches when analysing many closely related genomes, as with SARS-CoV-2 genome data and, more generally, throughout genomic epidemiology. To achieve this, we rewrite the classical Felsenstein pruning algorithm, which allows us to infer phylogenetic trees, with higher accuracy than existing maximum likelihood methods, on datasets at least 10 times larger than those methods can handle. Our algorithms provide a powerful framework for maximum likelihood genomic epidemiology and could facilitate similarly groundbreaking applications in Bayesian phylogenomic analyses.
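The abstract's framework starts from Felsenstein's pruning algorithm. As background, the sketch below shows the classic version for a single alignment site under the Jukes-Cantor model: each node's conditional likelihood vector is the product, over its children, of the transition-probability-weighted sums of the child's vector, and the site likelihood sums the root vector against uniform base frequencies. This is not the authors' rewritten, memory-efficient algorithm; the nested-tuple tree encoding `(left, t_left, right, t_right)` and the function names are illustrative assumptions.

```python
import math
from typing import Dict, List

BASES = "ACGT"


def jc_prob(same: bool, t: float) -> float:
    """Jukes-Cantor probability of ending in a given base after branch length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if same else 0.25 - 0.25 * e


def partial(node, leaf_states: Dict[str, str]) -> List[float]:
    """Conditional likelihood vector L_node(s) for s in A, C, G, T.

    Hypothetical tree encoding: a leaf is its name (str); an internal node is
    a tuple (left_subtree, left_branch_length, right_subtree, right_branch_length).
    """
    if isinstance(node, str):
        # Leaf: indicator vector for the observed base.
        obs = leaf_states[node]
        return [1.0 if b == obs else 0.0 for b in BASES]
    left, t_left, right, t_right = node
    l_left = partial(left, leaf_states)    # recurse: children before parent
    l_right = partial(right, leaf_states)
    out = []
    for s in range(4):
        # Sum over the unknown state of each child, then multiply the children.
        a = sum(jc_prob(s == x, t_left) * l_left[x] for x in range(4))
        b = sum(jc_prob(s == x, t_right) * l_right[x] for x in range(4))
        out.append(a * b)
    return out


def site_likelihood(tree, leaf_states: Dict[str, str]) -> float:
    """Likelihood of one site; JC equilibrium base frequencies are uniform (1/4)."""
    return sum(0.25 * v for v in partial(tree, leaf_states))
```

Usage on a hypothetical three-leaf tree:

```python
tree = ("x", 0.1, ("y", 0.05, "z", 0.2), 0.3)
print(site_likelihood(tree, {"x": "A", "y": "A", "z": "G"}))
```

Each internal node is visited once per site, so the cost grows linearly with the number of sequences and sites. Storing a full vector at every node for every site is the memory burden that becomes prohibitive at pandemic scale.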